Do the Hard Stuff First: Scheduling Dependent Computations in Data-Analytics Clusters

Authors

  • Robert Grandl
  • Srikanth Kandula
  • Sriram Rao
  • Aditya Akella
  • Janardhan Kulkarni

Abstract

We present DagPS, a scheduler that improves cluster utilization and job completion times by packing tasks with multi-resource requirements and inter-dependencies. While the underlying scheduling problem is intractable in general, DagPS is nearly optimal on the job DAGs that appear in production clusters at a large enterprise. Our key insight is that carefully handling the long-running tasks and those with tough-to-pack resource requirements will lead to good schedules for DAGs. However, which subset of tasks to treat carefully is unclear a priori. DagPS offers a novel search procedure that evaluates various possibilities and outputs a valid schedule. An online component enforces the schedules desired by the various jobs running on the cluster. In addition, it packs tasks and, for any desired fairness scheme, guarantees bounded unfairness. We evaluate DagPS on a 200-server cluster using traces of over 20,000 DAGs collected from a large production cluster. Relative to state-of-the-art schedulers, DagPS speeds up half of the jobs by over 30%.
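
The key insight above (give special treatment to long-running and hard-to-pack tasks, then search over which subset counts as "troublesome") can be illustrated with a toy Python sketch. It is not DagPS's algorithm: it is a plain priority list scheduler over a single pooled (cpu, mem) capacity vector, so it ignores per-machine fragmentation, and the online fairness component is not modeled. The names `Task`, `is_troublesome`, `simulate`, and `search`, the cut-off parameters `dur_cut` and `demand_cut`, and the threshold rule that defines a troublesome task are all hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    duration: float
    demand: tuple  # (cpu, mem) as fractions of the pooled cluster capacity
    parents: list = field(default_factory=list)


def is_troublesome(task, dur_cut, demand_cut):
    # Hypothetical criterion: long-running OR hard to pack (large demand).
    return task.duration >= dur_cut or max(task.demand) >= demand_cut


def simulate(tasks, dur_cut, demand_cut, capacity=(1.0, 1.0)):
    """Event-driven list scheduling on one pooled resource vector.

    Returns (makespan, start_times). Troublesome tasks are offered
    resources first; ties are broken by longer duration first.
    """
    by_name = {t.name: t for t in tasks}
    children = {t.name: [] for t in tasks}
    waiting_on = {t.name: len(t.parents) for t in tasks}
    for t in tasks:
        for p in t.parents:
            children[p].append(t.name)
    ready = [t.name for t in tasks if not t.parents]
    free = list(capacity)
    running = []  # min-heap of (finish_time, task_name)
    start = {}
    now = 0.0
    while ready or running:
        ready.sort(key=lambda n: (not is_troublesome(by_name[n], dur_cut, demand_cut),
                                  -by_name[n].duration))
        # Greedily start every ready task that fits, in priority order;
        # lower-priority tasks may backfill around ones that do not fit.
        still_ready = []
        for n in ready:
            d = by_name[n].demand
            if all(d[i] <= free[i] + 1e-9 for i in range(len(free))):
                for i in range(len(free)):
                    free[i] -= d[i]
                start[n] = now
                heapq.heappush(running, (now + by_name[n].duration, n))
            else:
                still_ready.append(n)
        ready = still_ready
        if not running:
            raise ValueError("a ready task does not fit in the cluster")
        # Advance time to the next task completion and release its resources.
        now, done = heapq.heappop(running)
        for i, d in enumerate(by_name[done].demand):
            free[i] += d
        # A child becomes ready once all of its parents have finished.
        for c in children[done]:
            waiting_on[c] -= 1
            if waiting_on[c] == 0:
                ready.append(c)
    return now, start


def search(tasks, dur_cuts, demand_cuts):
    """Try each (dur_cut, demand_cut) pair; keep the shortest makespan."""
    best = None
    for dur_cut, demand_cut in itertools.product(dur_cuts, demand_cuts):
        makespan, start = simulate(tasks, dur_cut, demand_cut)
        if best is None or makespan < best[0]:
            best = (makespan, dur_cut, demand_cut, start)
    return best


INF = float("inf")
EXAMPLE = [
    Task("m1", 5, (0.3, 0.1)),
    Task("m2", 5, (0.3, 0.1)),
    Task("m3", 5, (0.3, 0.1)),
    Task("w", 4, (0.9, 0.3)),         # short but hard to pack
    Task("c", 6, (0.1, 0.1), ["w"]),  # child of the hard-to-pack task
]

if __name__ == "__main__":
    print(simulate(EXAMPLE, INF, INF)[0])             # 15.0 (longest-first only)
    print(simulate(EXAMPLE, INF, 0.8)[0])             # 10.0 ("w" is troublesome)
    print(search(EXAMPLE, [INF, 5], [INF, 0.8])[:3])  # (10.0, inf, 0.8)
```

In this toy example, ordering by duration alone starts the three medium tasks first, so the wide task `w` and its child `c` must wait. Promoting `w` as hard to pack lets `c` overlap with the medium tasks, and the search over cut-offs selects that setting.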

Similar resources

An Efficient Bi-objective Genetic Algorithm for the Single Batch-Processing Machine Scheduling Problem with Sequence Dependent Family Setup Time and Non-identical Job Sizes

This paper considers the problem of minimizing make-span and maximum tardiness simultaneously for scheduling jobs under non-identical job sizes, dynamic job arrivals, incompatible job families, and sequence-dependent family setup time on the single batch-processor, where split size of jobs is allowed between batches. At first, a new Mixed Integer Linear Programming (MILP) model is proposed for t...

A heuristic approach for multi-stage sequence-dependent group scheduling problems

We present several heuristic algorithms based on tabu search for solving the multi-stage sequence-dependent group scheduling (SDGS) problem by considering minimization of makespan as the criterion. As the problem is recognized to be strongly NP-hard, several meta (tabu) search-based solution algorithms are developed to efficiently solve industry-size problem instances. Also, two different initi...

Single-machine scheduling considering carryover sequence-dependent setup time, and earliness and tardiness penalties of production

Production scheduling is one of the very important problems that industry and production are confronted with. Production scheduling is often planned in the industrial environments while productivity in production can improve significantly the expansion of simultaneous optimization of the scheduling plan. Production scheduling and production are two areas that have attracted much attention in...

A novel hybrid genetic algorithm to solve the make-to-order sequence-dependent flow-shop scheduling problem

Flow-shop scheduling problem (FSP) deals with the scheduling of a set of n jobs that visit a set of m machines in the same order. As the FSP is NP-hard, there is no efficient algorithm to reach the optimal solution of the problem. To minimize the holding, delay and setup costs of large permutation flow-shop scheduling problems with sequence-dependent setup times on each machine, this pap...

A genetic algorithm-based job scheduling model for big data analytics

Big data analytics (BDA) applications are a new category of software applications that process large amounts of data using scalable parallel processing infrastructure to obtain hidden value. Hadoop is the most mature open-source big data analytics framework, which implements the MapReduce programming model to process big data with MapReduce jobs. Big data analytics jobs are often continuous and...

Journal:
  • CoRR

Volume: abs/1604.07371

Publication date: 2016